31 research outputs found
Brook GLES Pi: democratising accelerator programming
Nowadays computing is heavily-based on accelerators, however, the cost of the hardware equipment prevents equal access to heterogeneous programming. In this work we present Brook GLES Pi, a port of the accelerator programming language Brook. Our solution, primarily focused on the educational platform Raspberry Pi, allows to teach, experiment and take advantage of heterogeneous programming on any low-cost embedded device featuring an OpenGL ES 2 GPU, democratising access to accelerator programming.This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence.Peer ReviewedPostprint (author's final draft
BaseSAFE: Baseband SAnitized Fuzzing through Emulation
Rogue base stations are an effective attack vector. Cellular basebands
represent a critical part of the smartphone's security: they parse large
amounts of data even before authentication. They can, therefore, grant an
attacker a very stealthy way to gather information about calls placed and even
to escalate to the main operating system, over-the-air. In this paper, we
discuss a novel cellular fuzzing framework that aims to help security
researchers find critical bugs in cellular basebands and similar embedded
systems. BaseSAFE allows partial rehosting of cellular basebands for fast
instrumented fuzzing off-device, even for closed-source firmware blobs.
BaseSAFE's sanitizing drop-in allocator, enables spotting heap-based
buffer-overflows quickly. Using our proof-of-concept harness, we fuzzed various
parsers of the Nucleus RTOS-based MediaTek cellular baseband that are
accessible from rogue base stations. The emulator instrumentation is highly
optimized, reaching hundreds of executions per second on each core for our
complex test case, around 15k test-cases per second in total. Furthermore, we
discuss attack vectors for baseband modems. To the best of our knowledge, this
is the first use of emulation-based fuzzing for security testing of commercial
cellular basebands. Most of the tooling and approaches of BaseSAFE are also
applicable for other low-level kernels and firmware. Using BaseSAFE, we were
able to find memory corruptions including heap out-of-bounds writes using our
proof-of-concept fuzzing harness in the MediaTek cellular baseband. BaseSAFE,
the harness, and a large collection of LTE signaling message test cases will be
released open-source upon publication of this paper
Fast and portable vector dsp simulation through automatic vectorization
\u3cp\u3eVector DSPs are quite common in embedded SoCs used in compute-intensive domains such as imaging and wireless communication. To achieve short time-to-market, it is crucial to provide system architects and SW developers with fast and accurate instruction set simulators of such DSPs. To this end, a methodology for accelerating the simulation of vector instructions in vector DSPs is proposed. The acceleration is achieved by enabling automatic translation of the vector instructions in a given vector DSP binary into host SIMD instructions. The key advantage of the proposed methodology is its independence from the host architecture. Empirical evaluation, using a set of commercial vector DSPs, shows that the proposed methodology provides a 4x average reduction in simulation time of a vector instruction and a 2x average reduction in simulation time of a whole application.\u3c/p\u3
Harissa: A flexible and efficient Java environment mixing bytecode and compiled code
The Java language provides a promising solution to the design of safe programs, with an application spectrum ranging from Web services to operating system components. The well-known tradeoff of Java's portability is the inefficiency of its basic execution model, which relies on the interpretation of an object-based virtual machine. Many solutions have been proposed to overcome this problem, such as just-in-time (JIT) and off-line bytecode compilers. However, most compilers trade efficiency for either portability or the ability to dynamically load bytecode. In this paper, we present an approach which reconciles portability and efficiency, and preserves the ability to dynamically load bytecode. We have designed and implemented an efficient environment for the execution of Java programs, named Harissa. Harissa permits the mixing of compiled and interpreted methods. Harissa's compiler translates Java bytecode to C, incorporating aggressive optimizations such as virtual method call optimization based on the Clas
Arbitrary and Variable Precision Floating-Point Arithmetic Support in Dynamic Binary Translation
International audienceFloating-point hardware support has more or less been settled 35 years ago by the adoption of the IEEE 754 standard. However, many scientific applications require higher accuracy than what can be represented on 64 bits, and to that end make use of dedicated arbitrary precision software libraries. To reach a good performance/accuracy trade-off, developers use variable precision, requiring e.g. more accuracy as the computation progresses. Hardware accelerators for this kind of computations do not exist yet, and independently of the actual quality of the underlying arithmetic computations, defining the right instruction set architecture, memory representations, etc, for them is a challenging task. We investigate in this paper the support for arbitrary and variable precision arithmetic in a dynamic binary translator, to help gain an insight of what such an accelerator could provide as an interface to compilers, and thus programmers. We detail our design and present an implementation in QEMU using the MPRF library for the RISCV processo
A fast analytical model of fully associative caches
While the cost of computation is an easy to understand local property, the
cost of data movement on cached architectures depends on global state, does not
compose, and is hard to predict. As a result, programmers often fail to
consider the cost of data movement. Existing cache models and simulators
provide the missing information but are computationally expensive. We present a
lightweight cache model for fully associative caches with least recently used
(LRU) replacement policy that gives fast and accurate results. We count the
cache misses without explicit enumeration of all memory accesses by using
symbolic counting techniques twice: 1) to derive the stack distance for each
memory access and 2) to count the memory accesses with stack distance larger
than the cache size. While this technique seems infeasible in theory, due to
non-linearities after the first round of counting, we show that the counting
problems are sufficiently linear in practice. Our cache model often computes
the results within seconds and contrary to simulation the execution time is
mostly problem size independent. Our evaluation measures modeling errors below
0.6% on real hardware. By providing accurate data placement information we
enable memory hierarchy aware software development.Comment: 14 pages, 16 figures, PLDI1